Data visualization
Variable assignment
Throughout the exercises in this chapter, you’ll be visualizing a subset of the gapminder data from the year 1952. First, you’ll have to load the ggplot2 package, and create a gapminder_1952 dataset to visualize.
# Load the knitr and kableExtra packages
library(knitr)
library(kableExtra)
options(knitr.table.format = "html")
# Load the gapminder package
library(gapminder)
# Load the dplyr package
library(dplyr)
# Load the ggplot2 package as well
library(ggplot2)
theme_set(theme_bw()) # pre-set the bw theme
# Create gapminder_1952
gapminder_1952 <- gapminder %>%
  filter(year == 1952)
# Look at the gapminder_1952 dataset
gapminder_1952 %>%
  kable(caption = "Gapminder from 1952") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), full_width = TRUE, position = "left", font_size = 11) %>%
  row_spec(0, bold = TRUE, color = "white", background = "#3f7689") %>%
  scroll_box(width = "100%", height = "300px")

| country | continent | year | lifeExp | pop | gdpPercap |
|---|---|---|---|---|---|
| Afghanistan | Asia | 1952 | 28.801 | 8425333 | 779.4453 |
| Albania | Europe | 1952 | 55.230 | 1282697 | 1601.0561 |
| Algeria | Africa | 1952 | 43.077 | 9279525 | 2449.0082 |
| Angola | Africa | 1952 | 30.015 | 4232095 | 3520.6103 |
| Argentina | Americas | 1952 | 62.485 | 17876956 | 5911.3151 |
| Australia | Oceania | 1952 | 69.120 | 8691212 | 10039.5956 |
| Austria | Europe | 1952 | 66.800 | 6927772 | 6137.0765 |
| Bahrain | Asia | 1952 | 50.939 | 120447 | 9867.0848 |
| Bangladesh | Asia | 1952 | 37.484 | 46886859 | 684.2442 |
| Belgium | Europe | 1952 | 68.000 | 8730405 | 8343.1051 |
| Benin | Africa | 1952 | 38.223 | 1738315 | 1062.7522 |
| Bolivia | Americas | 1952 | 40.414 | 2883315 | 2677.3263 |
| Bosnia and Herzegovina | Europe | 1952 | 53.820 | 2791000 | 973.5332 |
| Botswana | Africa | 1952 | 47.622 | 442308 | 851.2411 |
| Brazil | Americas | 1952 | 50.917 | 56602560 | 2108.9444 |
| Bulgaria | Europe | 1952 | 59.600 | 7274900 | 2444.2866 |
| Burkina Faso | Africa | 1952 | 31.975 | 4469979 | 543.2552 |
| Burundi | Africa | 1952 | 39.031 | 2445618 | 339.2965 |
| Cambodia | Asia | 1952 | 39.417 | 4693836 | 368.4693 |
| Cameroon | Africa | 1952 | 38.523 | 5009067 | 1172.6677 |
| Canada | Americas | 1952 | 68.750 | 14785584 | 11367.1611 |
| Central African Republic | Africa | 1952 | 35.463 | 1291695 | 1071.3107 |
| Chad | Africa | 1952 | 38.092 | 2682462 | 1178.6659 |
| Chile | Americas | 1952 | 54.745 | 6377619 | 3939.9788 |
| China | Asia | 1952 | 44.000 | 556263527 | 400.4486 |
| Colombia | Americas | 1952 | 50.643 | 12350771 | 2144.1151 |
| Comoros | Africa | 1952 | 40.715 | 153936 | 1102.9909 |
| Congo, Dem. Rep. | Africa | 1952 | 39.143 | 14100005 | 780.5423 |
| Congo, Rep. | Africa | 1952 | 42.111 | 854885 | 2125.6214 |
| Costa Rica | Americas | 1952 | 57.206 | 926317 | 2627.0095 |
| Cote d’Ivoire | Africa | 1952 | 40.477 | 2977019 | 1388.5947 |
| Croatia | Europe | 1952 | 61.210 | 3882229 | 3119.2365 |
| Cuba | Americas | 1952 | 59.421 | 6007797 | 5586.5388 |
| Czech Republic | Europe | 1952 | 66.870 | 9125183 | 6876.1403 |
| Denmark | Europe | 1952 | 70.780 | 4334000 | 9692.3852 |
| Djibouti | Africa | 1952 | 34.812 | 63149 | 2669.5295 |
| Dominican Republic | Americas | 1952 | 45.928 | 2491346 | 1397.7171 |
| Ecuador | Americas | 1952 | 48.357 | 3548753 | 3522.1107 |
| Egypt | Africa | 1952 | 41.893 | 22223309 | 1418.8224 |
| El Salvador | Americas | 1952 | 45.262 | 2042865 | 3048.3029 |
| Equatorial Guinea | Africa | 1952 | 34.482 | 216964 | 375.6431 |
| Eritrea | Africa | 1952 | 35.928 | 1438760 | 328.9406 |
| Ethiopia | Africa | 1952 | 34.078 | 20860941 | 362.1463 |
| Finland | Europe | 1952 | 66.550 | 4090500 | 6424.5191 |
| France | Europe | 1952 | 67.410 | 42459667 | 7029.8093 |
| Gabon | Africa | 1952 | 37.003 | 420702 | 4293.4765 |
| Gambia | Africa | 1952 | 30.000 | 284320 | 485.2307 |
| Germany | Europe | 1952 | 67.500 | 69145952 | 7144.1144 |
| Ghana | Africa | 1952 | 43.149 | 5581001 | 911.2989 |
| Greece | Europe | 1952 | 65.860 | 7733250 | 3530.6901 |
| Guatemala | Americas | 1952 | 42.023 | 3146381 | 2428.2378 |
| Guinea | Africa | 1952 | 33.609 | 2664249 | 510.1965 |
| Guinea-Bissau | Africa | 1952 | 32.500 | 580653 | 299.8503 |
| Haiti | Americas | 1952 | 37.579 | 3201488 | 1840.3669 |
| Honduras | Americas | 1952 | 41.912 | 1517453 | 2194.9262 |
| Hong Kong, China | Asia | 1952 | 60.960 | 2125900 | 3054.4212 |
| Hungary | Europe | 1952 | 64.030 | 9504000 | 5263.6738 |
| Iceland | Europe | 1952 | 72.490 | 147962 | 7267.6884 |
| India | Asia | 1952 | 37.373 | 372000000 | 546.5657 |
| Indonesia | Asia | 1952 | 37.468 | 82052000 | 749.6817 |
| Iran | Asia | 1952 | 44.869 | 17272000 | 3035.3260 |
| Iraq | Asia | 1952 | 45.320 | 5441766 | 4129.7661 |
| Ireland | Europe | 1952 | 66.910 | 2952156 | 5210.2803 |
| Israel | Asia | 1952 | 65.390 | 1620914 | 4086.5221 |
| Italy | Europe | 1952 | 65.940 | 47666000 | 4931.4042 |
| Jamaica | Americas | 1952 | 58.530 | 1426095 | 2898.5309 |
| Japan | Asia | 1952 | 63.030 | 86459025 | 3216.9563 |
| Jordan | Asia | 1952 | 43.158 | 607914 | 1546.9078 |
| Kenya | Africa | 1952 | 42.270 | 6464046 | 853.5409 |
| Korea, Dem. Rep. | Asia | 1952 | 50.056 | 8865488 | 1088.2778 |
| Korea, Rep. | Asia | 1952 | 47.453 | 20947571 | 1030.5922 |
| Kuwait | Asia | 1952 | 55.565 | 160000 | 108382.3529 |
| Lebanon | Asia | 1952 | 55.928 | 1439529 | 4834.8041 |
| Lesotho | Africa | 1952 | 42.138 | 748747 | 298.8462 |
| Liberia | Africa | 1952 | 38.480 | 863308 | 575.5730 |
| Libya | Africa | 1952 | 42.723 | 1019729 | 2387.5481 |
| Madagascar | Africa | 1952 | 36.681 | 4762912 | 1443.0117 |
| Malawi | Africa | 1952 | 36.256 | 2917802 | 369.1651 |
| Malaysia | Asia | 1952 | 48.463 | 6748378 | 1831.1329 |
| Mali | Africa | 1952 | 33.685 | 3838168 | 452.3370 |
| Mauritania | Africa | 1952 | 40.543 | 1022556 | 743.1159 |
| Mauritius | Africa | 1952 | 50.986 | 516556 | 1967.9557 |
| Mexico | Americas | 1952 | 50.789 | 30144317 | 3478.1255 |
| Mongolia | Asia | 1952 | 42.244 | 800663 | 786.5669 |
| Montenegro | Europe | 1952 | 59.164 | 413834 | 2647.5856 |
| Morocco | Africa | 1952 | 42.873 | 9939217 | 1688.2036 |
| Mozambique | Africa | 1952 | 31.286 | 6446316 | 468.5260 |
| Myanmar | Asia | 1952 | 36.319 | 20092996 | 331.0000 |
| Namibia | Africa | 1952 | 41.725 | 485831 | 2423.7804 |
| Nepal | Asia | 1952 | 36.157 | 9182536 | 545.8657 |
| Netherlands | Europe | 1952 | 72.130 | 10381988 | 8941.5719 |
| New Zealand | Oceania | 1952 | 69.390 | 1994794 | 10556.5757 |
| Nicaragua | Americas | 1952 | 42.314 | 1165790 | 3112.3639 |
| Niger | Africa | 1952 | 37.444 | 3379468 | 761.8794 |
| Nigeria | Africa | 1952 | 36.324 | 33119096 | 1077.2819 |
| Norway | Europe | 1952 | 72.670 | 3327728 | 10095.4217 |
| Oman | Asia | 1952 | 37.578 | 507833 | 1828.2303 |
| Pakistan | Asia | 1952 | 43.436 | 41346560 | 684.5971 |
| Panama | Americas | 1952 | 55.191 | 940080 | 2480.3803 |
| Paraguay | Americas | 1952 | 62.649 | 1555876 | 1952.3087 |
| Peru | Americas | 1952 | 43.902 | 8025700 | 3758.5234 |
| Philippines | Asia | 1952 | 47.752 | 22438691 | 1272.8810 |
| Poland | Europe | 1952 | 61.310 | 25730551 | 4029.3297 |
| Portugal | Europe | 1952 | 59.820 | 8526050 | 3068.3199 |
| Puerto Rico | Americas | 1952 | 64.280 | 2227000 | 3081.9598 |
| Reunion | Africa | 1952 | 52.724 | 257700 | 2718.8853 |
| Romania | Europe | 1952 | 61.050 | 16630000 | 3144.6132 |
| Rwanda | Africa | 1952 | 40.000 | 2534927 | 493.3239 |
| Sao Tome and Principe | Africa | 1952 | 46.471 | 60011 | 879.5836 |
| Saudi Arabia | Asia | 1952 | 39.875 | 4005677 | 6459.5548 |
| Senegal | Africa | 1952 | 37.278 | 2755589 | 1450.3570 |
| Serbia | Europe | 1952 | 57.996 | 6860147 | 3581.4594 |
| Sierra Leone | Africa | 1952 | 30.331 | 2143249 | 879.7877 |
| Singapore | Asia | 1952 | 60.396 | 1127000 | 2315.1382 |
| Slovak Republic | Europe | 1952 | 64.360 | 3558137 | 5074.6591 |
| Slovenia | Europe | 1952 | 65.570 | 1489518 | 4215.0417 |
| Somalia | Africa | 1952 | 32.978 | 2526994 | 1135.7498 |
| South Africa | Africa | 1952 | 45.009 | 14264935 | 4725.2955 |
| Spain | Europe | 1952 | 64.940 | 28549870 | 3834.0347 |
| Sri Lanka | Asia | 1952 | 57.593 | 7982342 | 1083.5320 |
| Sudan | Africa | 1952 | 38.635 | 8504667 | 1615.9911 |
| Swaziland | Africa | 1952 | 41.407 | 290243 | 1148.3766 |
| Sweden | Europe | 1952 | 71.860 | 7124673 | 8527.8447 |
| Switzerland | Europe | 1952 | 69.620 | 4815000 | 14734.2327 |
| Syria | Asia | 1952 | 45.883 | 3661549 | 1643.4854 |
| Taiwan | Asia | 1952 | 58.500 | 8550362 | 1206.9479 |
| Tanzania | Africa | 1952 | 41.215 | 8322925 | 716.6501 |
| Thailand | Asia | 1952 | 50.848 | 21289402 | 757.7974 |
| Togo | Africa | 1952 | 38.596 | 1219113 | 859.8087 |
| Trinidad and Tobago | Americas | 1952 | 59.100 | 662850 | 3023.2719 |
| Tunisia | Africa | 1952 | 44.600 | 3647735 | 1468.4756 |
| Turkey | Europe | 1952 | 43.585 | 22235677 | 1969.1010 |
| Uganda | Africa | 1952 | 39.978 | 5824797 | 734.7535 |
| United Kingdom | Europe | 1952 | 69.180 | 50430000 | 9979.5085 |
| United States | Americas | 1952 | 68.440 | 157553000 | 13990.4821 |
| Uruguay | Americas | 1952 | 66.071 | 2252965 | 5716.7667 |
| Venezuela | Americas | 1952 | 55.088 | 5439568 | 7689.7998 |
| Vietnam | Asia | 1952 | 40.412 | 26246839 | 605.0665 |
| West Bank and Gaza | Asia | 1952 | 43.160 | 1030585 | 1515.5923 |
| Yemen, Rep. | Asia | 1952 | 32.548 | 4963829 | 781.7176 |
| Zambia | Africa | 1952 | 42.038 | 2672000 | 1147.3888 |
| Zimbabwe | Africa | 1952 | 48.451 | 3080907 | 406.8841 |
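As a quick sanity check on the subset, the sketch below confirms that `gapminder_1952` holds one row per country and only the year 1952. The names `n_rows` and `years_present` are introduced here for illustration and are not part of the original exercise.

```r
library(gapminder)
library(dplyr)

# Rebuild the 1952 subset exactly as in the exercise
gapminder_1952 <- gapminder %>%
  filter(year == 1952)

# Illustrative checks (variable names are assumptions, not from the exercise)
n_rows <- nrow(gapminder_1952)                # number of countries in the subset
years_present <- unique(gapminder_1952$year)  # should contain a single year

n_rows
years_present
```

The row count should match the 142 countries listed in the table above, and `years_present` should contain only 1952.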
Comparing population and GDP per capita
In the video you learned to create a scatter plot with GDP per capita on the x-axis and life expectancy on the y-axis. When you’re exploring data visually, you’ll often need to try different combinations of variables and aesthetics; the code below instead puts population on the x-axis and GDP per capita on the y-axis.
ggplot(gapminder_1952, aes(x = pop, y = gdpPercap)) +
  geom_point() +
  geom_smooth(method = "loess", se = FALSE) +
  labs(subtitle = "GDP per capita by population",
       y = "GDP per capita",
       x = "Population",
       title = "Scatterplot",
       caption = "")
Each point represents a country. Can you guess which country any of the points represents?
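One way to answer that question is to sort the data and look at the extremes. The following is a minimal sketch, assuming the same `gapminder_1952` subset; `top_gdp` and `top_pop` are illustrative names, not part of the exercise.

```r
library(gapminder)
library(dplyr)

gapminder_1952 <- gapminder %>%
  filter(year == 1952)

# Countries with the highest GDP per capita (the points highest on the y-axis)
top_gdp <- gapminder_1952 %>%
  arrange(desc(gdpPercap)) %>%
  select(country, pop, gdpPercap) %>%
  head(3)

# Countries with the largest population (the points farthest right on the x-axis)
top_pop <- gapminder_1952 %>%
  arrange(desc(pop)) %>%
  select(country, pop, gdpPercap) %>%
  head(3)

top_gdp
top_pop
```

In the table above, Kuwait has by far the highest GDP per capita and China the largest population, so these are the isolated points at the top and at the far right of the plot.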
Comparing population and life expectancy
In this exercise, you’ll use ggplot2 to create a scatter plot from scratch, to compare each country’s population with its life expectancy in the year 1952.
# Create a scatter plot with pop on the x-axis and lifeExp on the y-axis
ggplot(gapminder_1952, aes(x = pop, y = lifeExp)) +
  geom_point() +
  geom_smooth(method = "loess", se = FALSE) +
  labs(subtitle = "Country's population with its life expectancy in the year 1952",
       y = "Life Expectancy",
       x = "Population",
       title = "Scatterplot",
       caption = "")
You might notice the points are crowded towards the left side of the plot, making them hard to distinguish.
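The crowding can be quantified. The sketch below counts how many countries have a population below one tenth of the largest one; the threshold of one tenth and the names `max_pop`, `n_small`, and `share_small` are assumptions chosen for illustration.

```r
library(gapminder)
library(dplyr)

gapminder_1952 <- gapminder %>%
  filter(year == 1952)

# A linear x-axis has to stretch all the way to the largest population
max_pop <- max(gapminder_1952$pop)

# Countries below one tenth of that maximum end up in roughly
# the leftmost tenth of a linear axis
n_small <- sum(gapminder_1952$pop < max_pop / 10)
share_small <- n_small / nrow(gapminder_1952)

n_small
share_small
```

For the table above this comes to 135 of the 142 countries, which is why almost all points pile up near the left edge.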
Putting the x-axis on a log scale
You previously created a scatter plot with population on the x-axis and life expectancy on the y-axis. Since population is spread over several orders of magnitude, with some countries having a much higher population than others, it’s a good idea to put the x-axis on a log scale.
# Change this plot to put the x-axis on a log scale
ggplot(gapminder_1952, aes(x = pop, y = lifeExp)) +
  geom_point() +
  scale_x_log10() +
  geom_smooth(method = "loess", se = FALSE) +
  labs(subtitle = "Country's population (log scale) with its life expectancy in the year 1952",
       y = "Life Expectancy",
       x = "Population",
       title = "Scatterplot",
       caption = "")
Notice the points are more spread out on the x-axis. This makes it easy to see that there isn’t a clear correlation between population and life expectancy.
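To see what the log scale does to the data, and to put a number on the visual impression, the sketch below compares the raw and log-transformed population ranges and computes a correlation coefficient. It uses base R's `log10()` and `cor()` on the same subset; the variable names are illustrative assumptions.

```r
library(gapminder)
library(dplyr)

gapminder_1952 <- gapminder %>%
  filter(year == 1952)

# Raw populations: smallest and largest country
pop_range <- range(gapminder_1952$pop)

# The same two values on a log10 scale, and the distance between them
log_pop_range <- log10(pop_range)
orders_of_magnitude <- diff(log_pop_range)

# Pearson correlation between log10(population) and life expectancy;
# a value near 0 would be consistent with the plot showing no clear trend
cor_log_pop_life <- cor(log10(gapminder_1952$pop), gapminder_1952$lifeExp)

pop_range
orders_of_magnitude
cor_log_pop_life
```

With the values in the table above (Sao Tome and Principe at 60,011 and China at 556,263,527), the populations span just under four orders of magnitude, which a log axis lays out evenly.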
Putting the x- and y- axes on a log scale
Suppose you want to create a scatter plot with population on the x-axis and GDP per capita on the y-axis. Both population and GDP per-capita are better represented with log scales, since they vary over many orders of magnitude.
# Scatter plot comparing pop and gdpPercap, with both axes on a log scale
ggplot(gapminder_1952, aes(x = pop, y = gdpPercap)) +
  geom_point() +
  scale_x_log10() +
  scale_y_log10() +
  geom_smooth(method = "loess", se = FALSE) +
  labs(subtitle = "Country's population (log scale) with GDP per capita (log scale) in the year 1952",
       y = "GDP per capita",
       x = "Population",
       title = "Scatterplot",
       caption = "")
Notice that the y-axis goes from 1e3 (1000) to 1e4 (10,000) to 1e5 (100,000) in equal increments.
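The equal spacing follows directly from the logarithm: each tick is ten times the previous one, so its base-10 logarithm grows by exactly one. A small base R sketch (the names `ticks`, `positions`, and `spacing` are illustrative):

```r
# Tick values shown on the log-scaled y-axis
ticks <- c(1e3, 1e4, 1e5)

# Their positions along the axis are their base-10 logarithms
positions <- log10(ticks)

# Distance between neighbouring ticks, in log units
spacing <- diff(positions)

positions
spacing
```

The positions come out as 3, 4, and 5, so the gap between neighbouring ticks is 1 in both cases even though the raw values differ by 9,000 and 90,000.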
Adding color to a scatter plot
In this lesson you learned how to use the color aesthetic, which can be used to show which continent each point in a scatter plot represents.
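It may help to contrast mapping a color with setting one. The sketch below is an illustration under the same `gapminder_1952` subset: placing `color` inside `aes()` ties it to a column, while placing it in `geom_point()` applies one fixed color. The object names `p_mapped` and `p_fixed` and the color `"steelblue"` are arbitrary choices for this example.

```r
library(gapminder)
library(dplyr)
library(ggplot2)

gapminder_1952 <- gapminder %>%
  filter(year == 1952)

# Mapping: color is inside aes(), so it varies with the continent column
# and ggplot2 adds a legend
p_mapped <- ggplot(gapminder_1952, aes(x = pop, y = lifeExp, color = continent)) +
  geom_point() +
  scale_x_log10()

# Setting: color is outside aes(), so every point gets the same fixed color
# and no legend is drawn
p_fixed <- ggplot(gapminder_1952, aes(x = pop, y = lifeExp)) +
  geom_point(color = "steelblue") +
  scale_x_log10()

print(p_mapped)
print(p_fixed)
```

Only the first form carries information about the data, which is why the exercise puts `color = continent` inside `aes()`.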
# Scatter plot comparing pop and lifeExp, with color representing continent
ggplot(gapminder_1952, aes(x = pop, y = lifeExp, color = continent)) +
  geom_point() +
  scale_x_log10() +
  labs(subtitle = "Country's population (log scale) with Life expectancy in the year 1952",
       y = "Life expectancy",
       x = "Population",
       title = "Scatterplot colored by continent",
       caption = "")
Adding size and color to a plot
In the last exercise, you created a scatter plot communicating information about each country’s population, life expectancy, and continent. Now you’ll use the size of the points to communicate even more.
# Add the size aesthetic to represent a country's gdpPercap
ggplot(gapminder_1952, aes(x = pop, y = lifeExp, color = continent, size = gdpPercap)) +
  geom_point() +
  scale_x_log10() +
  labs(subtitle = "Country's population (log scale) with Life expectancy in the year 1952",
       y = "Life expectancy",
       x = "Population",
       title = "Scatterplot colored by continent, sized by GDP per capita",
       caption = "")
Creating a subgraph for each continent
You’ve learned to use faceting to divide a graph into subplots based on one of its variables, such as the continent.
# Scatter plot comparing pop and lifeExp, faceted by continent
ggplot(gapminder_1952, aes(x = pop, y = lifeExp)) +
  geom_point() +
  scale_x_log10() +
  facet_wrap(~ continent) +
  labs(subtitle = "Country's population (log scale) with Life expectancy in the year 1952 by Continent",
       y = "Life expectancy",
       x = "Population",
       title = "Scatterplot of each continent",
       caption = "")
Faceting is a powerful way to understand subsets of your data separately.
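A numeric counterpart to the faceted plot is a grouped summary with one row per panel. The following is a sketch using dplyr's `group_by()` and `summarize()`; the choice of medians and the column names `n_countries`, `median_lifeExp`, and `median_pop` are assumptions made for illustration.

```r
library(gapminder)
library(dplyr)

gapminder_1952 <- gapminder %>%
  filter(year == 1952)

# One row per continent, mirroring the one-panel-per-continent layout
by_continent <- gapminder_1952 %>%
  group_by(continent) %>%
  summarize(
    n_countries = n(),                 # number of points drawn in that panel
    median_lifeExp = median(lifeExp),  # typical height of the points
    median_pop = median(pop)           # typical horizontal position
  )

by_continent
```

Reading this table next to the facets makes it easier to tell a sparse panel (few countries) from one whose points simply overlap.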
Faceting by year
All of the graphs in this chapter have been visualizing statistics within one year. Now that you’re able to use faceting, however, you can create a graph showing all the country-level data from 1952 to 2007, to understand how global statistics have changed over time.
# Scatter plot comparing gdpPercap and lifeExp, with color representing continent
# and size representing population, faceted by year
ggplot(gapminder, aes(x = gdpPercap, y = lifeExp, color = continent, size = pop)) +
  geom_point() +
  scale_x_log10() +
  facet_wrap(~ year) +
  labs(subtitle = "GDP per capita (log scale) with Life expectancy by Continent and size population",
       y = "Life expectancy",
       x = "GDP per Capita",
       title = "Scatterplot, every 5 years from 1952 to 2007",
       caption = "")
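Before faceting on a variable, it can be useful to check how many panels it will produce. The sketch below lists the distinct years in the full `gapminder` data; `years` and `n_panels` are illustrative names.

```r
library(gapminder)

# Distinct years in the full dataset; each one becomes a facet panel
years <- sort(unique(gapminder$year))
n_panels <- length(years)

years
n_panels
```

The years run from 1952 to 2007 in steps of five, giving 12 panels. If the default layout feels cramped, passing `ncol = 4` to `facet_wrap()` is one option for arranging them in a 3-by-4 grid.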